Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connected processing to loading the dataset. #11

Merged
merged 6 commits into from
Feb 22, 2024
Merged

Conversation

mooniean
Copy link
Collaborator

@mooniean mooniean commented Feb 12, 2024

added processing when transformations are present.
need to add documentation to explain how it works.

added random seed

Addresses

…ntation to explain how it works. added random seed
@mooniean mooniean added the disk Related to loading from Disk label Feb 12, 2024
@mooniean mooniean self-assigned this Feb 12, 2024
transformations=TRANSFORM_SOME,
)
test_loader.load(datapath=TEST_DATA_MRC, datatype=DATATYPE_MRC)
assert not test_loader.dataset.normalise
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add a test to make sure the transformation is happening, e.g. that the output dataset (for an example data point) is different if you pass a transformation to passing none.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest commit addresses this!! I hope that's ok, made a debug flag that doesn't shuffle the paths so we can test the processing is happening.

@@ -127,3 +129,33 @@ def test_get_loader_training_fail():
torch_loader_train, torch_loader_val = test_loader.get_loader(
split_size=1, batch_size=64
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it will be good to test the data loader returning what we want, I think here we have only tested that the class variables are correct (which is great!) but we won't catch things like the data loader not returning the right size of data, or selecting correctly the labels.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a solid point! I checked it but realised it was only on a local function, I'll make that into assertions here

@mooniean
Copy link
Collaborator Author

Took a little while but I think I've addressed everything, including the rescale bugs, testing before/after processing

@crangelsmith
Copy link
Collaborator

Looks good @mooniean! Just don't forget to add docstrings to document variables, but that can be in another PR to not stop progress.

@mooniean mooniean merged commit c6b4672 into main Feb 22, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disk Related to loading from Disk
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants